

# Integrated Retiming and Simultaneous Vdd/Vth Scaling for Total Power Minimization



Mongkol Ekpanyapong

Advisor: Prof. Sung Kyu Lim

School of Electrical and Computer Engineering
Georgia Institute of Technology





#### **Outline**



- Introduction and Motivation
- Related Work
- Methodology
- Experimental Results
- Conclusions





#### Introduction

 Both static and dynamic power are the important issue in deep submicron design

Performance is important issue

The objective of this work is to minimize total power consumption while maintain the target clock period







# **Retiming Algorithm**

- Linear Programming
  - Can easily be modified to handle any linear objective

- Bellman-Ford Algorithm
  - Can handle large circuits





#### **Power Minimization**

Minimize total number of Flip-flop to reduce flip-flop power

Using dual Vdd and Vth to minimize static and dynamic power



#### **Outline**



- Introduction and Motivation
- Related Work
- Methodology
- Experimental Results
- Conclusions





### Retiming and Voltage Scaling

 C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry," Algorithmica 1991

 K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design", ISLPED 1995

N. Chabini and W. Wolf, "Reducing Dynamic Power Consumption in Synchronous Sequential Digital Designs Using Retiming and Supply Voltage Scaling," TVLSI 2004



#### **Outline**



- Introduction and Motivation
- Related Work
- Methodology
- Experimental Results
- Conclusions





# **Power Minimization with Retiming**







#### Objective-

Minimize the number of flip-flops (FF.)

#### Minimize $\{FI(v) - FO(v)\} \cdot r(v)$ (53)

Subject to:

$$r(u) - r(v) \le w(e_{u,v}), \ \forall e_{u,v} \in E \quad (54)$$

$$r(u) - r(v) \le W(u, v) - 1, \ \forall D(u, v) > L, \ \forall u, v \in V$$
 (55)

#### Constraints:

- Num. FF. has to be satisfied r(u) ≤ w(e<sub>u,v</sub>) + r(v)
- Num. FF. on critical paths has to be greater than zero



V is the set of gates and E is the set of edges.  $v \in V$  and  $e \in E$  r(v) is the number of FF. moved from fanout of node v to fanin of node v  $w(e_{u,v})$  is the FF. count on edge u,v,





#### Objective-

Minimize the number of flip-flops (FF.)

#### Minimize $\{FI(v) - FO(v)\} \cdot r(v)$ (53)

Subject to:

$$r(u) - r(v) \le w(e_{u,v}), \ \forall e_{u,v} \in E \quad (54)$$

$$r(u) - r(v) \le W(u, v) - 1, \ \forall D(u, v) > L, \ \forall u, v \in V$$
 (55)

#### Constraints:

Num. FF. has to be satisfied r(u) ≤ w(e<sub>u,v</sub>) + r(v)

Num. FF. on critical paths has to be greater than zero



V is the set of gates and E is the set of edges.  $v \in V$  and  $e \in E$  r(v) is the number of FF. moved from fanout of node v to fanin of node v  $w(e_{u,v})$  is the FF. count on edge u,v,





#### Objective:

Minimize the number of flip-flops (FF.)

#### Minimize $\{FI(v) - FO(v)\} \cdot r(v)$ (53)

Subject to:

$$r(u) - r(v) \le w(e_{u,v}), \ \forall e_{u,v} \in E \quad (54)$$

$$r(u) - r(v) \le W(u, v) - 1, \ \forall D(u, v) > L, \ \forall u, v \in V$$
 (55)

#### Constraints:

- Num. FF. has to be satisfied
  - $r(u) \le w(e_{u,v}) + r(v)$
- Num. FF. on critical paths has to be greater than zero

Only these 2 FF. can move out of u



V is the set of gates and E is the set of edges.  $v \in V$  and  $e \in E$  r(v) is the number of FF. moved from fanout of node v to fanin of node v  $w(e_{u,v})$  is the FF. count on edge u,v,





#### Objective:

Minimize the number of flip-flops (FF.)

#### $Minimize \{FI(v) - FO(v)\} \cdot r(v)$ (53)

Subject to:

$$r(u) - r(v) \le w(e_{u,v}), \ \forall e_{u,v} \in E \quad (54)$$

$$r(u) - r(v) \le W(u, v) - 1, \ \forall D(u, v) > L, \ \forall u, v \in V \quad (55)$$

#### Constraints:

- Num. FF. has to be satisfied r(u) ≤ w(e<sub>u,v</sub>) + r(v)
- Num. FF. on critical paths has to be greater than zero

$$r(1)-r(3) \le 0 \quad \Rightarrow \quad r(1) \le r(3)$$

Cycle Time (L) =2

$$D(1,2) = 2$$

$$D(1,3) = 3$$

$$D(2,3) = 2$$

$$W(1,2) = 0$$

$$W(1,3) = 1$$

$$W(2,3) = 1$$

V is the set of gates and E is the set of edges.  $v \in V$  and  $e \in E$  r(v) is the number of FF. moved from fanout of node v to fanin of node v  $w(e_{u,v})$  is the FF. count on edge u,v,





#### Objective:

Minimize the number of flip-flops (FF.)

#### Minimize $\{FI(v) - FO(v)\} \cdot r(v)$ (53)

Subject to:

$$r(u) - r(v) \le w(e_{u,v}), \ \forall e_{u,v} \in E \quad (54)$$

$$r(u) - r(v) \le W(u, v) - 1, \ \forall D(u, v) > L, \ \forall u, v \in V$$
 (55)

#### Constraints:

Num. FF. has to be satisfied r(u) ≤ w(e<sub>u,v</sub>) + r(v)

Num. FF. on critical paths has to be greater than zero

$$r(1)-r(3) \le 0 \quad \Rightarrow \quad r(1) \le r(3)$$

$$D(1,2) = 2$$

$$D(1,3) = 3$$

$$D(2,3) = 2$$

$$W(1,2) = 0$$

$$W(1,3) = 1$$

$$W(2,3) = 1$$

V is the set of gates and E is the set of edges.  $v \in V$  and  $e \in E$  r(v) is the number of FF. moved from fanout of node v to fanin of node v  $w(e_{u,v})$  is the FF. count on edge u,v,



#### Non-critical Gates for Power Minimization



Non-critical gates: What should we do?

We can use the voltage scaling for non-critical gates after retiming to minimize total power consumption





# Low-to-High V<sub>dd</sub> Conversion

Level Converter (LC) requirement







Objective:

Minimize gate power + LC power

Constraints:

Each gate has to be assigned to only one voltage state

Arrival time + gate delay of each node ≤ target clock period

Level converter inserted if low V<sub>dd</sub> node drives high V<sub>dd</sub> node

MILP Formulation

Minimize 
$$\left(\sum_{v \in V} \sum_{k=1}^{4} p_{v,k} \cdot x_{v,k}\right) + \left(\sum_{e \in E} p_{lc} \cdot m_e\right)$$
 (58)

Subject to:

$$\sum_{k=1}^{4} x_{v,k} = 1, \forall v \in V$$
(59)

Timing constraints:

$$\sum_{k=1}^{4} d_{v,k} \cdot x_{v,k} + s(v) \le L, \ \forall v \in V$$
 (60)

$$\sum_{k=1}^{4} d_{u,k} \cdot x_{u,k} + d_{lc} \cdot m(e) + s(u) \le s(v), \forall e_{u,v} \in E \quad (61)$$

$$s(v) \ge 0, \forall v \in V$$
 (62)

Level converter (LC) constraints:

$$\sum_{i=1}^{4} z_{u,i} \cdot x_{u,i} - \sum_{j=1}^{4} z_{v,j} \cdot x_{v,j} + D \cdot m(e) \ge 0, \ \forall e \in E$$
 (63)

$$x_{v,k} \in \{0,1\}, \forall v \in V$$
 (64)

$$m(e) \in \{0, 1\}, \forall e \in E$$
 (65)





MILP Formulation

Minimize 
$$\left(\sum_{v \in V} \sum_{k=1}^{4} p_{v,k} \cdot x_{v,k}\right) + \left(\sum_{e \in E} p_{lc} \cdot m_e\right)$$
 (58)

Subject to:

$$\sum_{k=1}^{4} x_{v,k} = 1, \ \forall v \in V$$
 (59)

Timing constraints:

$$\sum_{k=1}^{4} d_{v,k} \cdot x_{v,k} + s(v) \le L, \forall v \in V \quad (60)$$

$$\sum_{k=1}^{4} d_{u,k} \cdot x_{u,k} + d_{lc} \cdot m(e) + s(u) \le s(v), \ \forall e_{u,v} \in E$$
 (61)

$$s(v) \ge 0, \forall v \in V$$
 (62)

Level converter (LC) constraints:

$$\sum_{i=1}^{4} z_{u,i} \cdot x_{u,i} - \sum_{j=1}^{4} z_{v,j} \cdot x_{v,j} + D \cdot m(e) \ge 0, \ \forall e \in E \quad (63)$$

$$x_{v,k} \in \{0,1\}, \forall v \in V$$
 (64)

$$m(e) \in \{0, 1\}, \forall e \in E$$
 (65)







$$(v)$$
  $V_{dd}$  Low  $V_{th}$  High  $(x_{v,1}=1)$ 





MILP Formulation

Minimize 
$$\left(\sum_{v \in V} \sum_{k=1}^{4} p_{v,k} \cdot x_{v,k}\right) + \left(\sum_{e \in E} p_{lc} \cdot m_e\right)$$
 (58)

Subject to:

$$\sum_{k=1}^{4} x_{v,k} = 1, \forall v \in V$$
(59)

Timing constraints:

$$\sum_{k=1}^{4} d_{v,k} \cdot x_{v,k} + s(v) \le L, \forall v \in V \quad (60)$$

$$\sum_{k=1}^{3} d_{u,k} \cdot x_{u,k} + d_{lc} \cdot m(e) + s(u) \le s(v), \ \forall e_{u,v} \in E$$
 (61)

$$s(v) \ge 0, \forall v \in V$$
 (62)

Level converter (LC) constraints:

$$\sum_{i=1}^{4} z_{u,i} \cdot x_{u,i} - \sum_{j=1}^{4} z_{v,j} \cdot x_{v,j} + D \cdot m(e) \ge 0, \ \forall e \in E \quad (63)$$

$$x_{v,k} \in \{0,1\}, \forall v \in V$$
 (64)

$$m(e) \in \{0, 1\}, \forall e \in E$$
 (65)

$$s(u) = 0$$
  
  $d(u) = 1$   $s(v) = 1$ 







MILP Formulation

Minimize 
$$\left(\sum_{v \in V} \sum_{k=1}^{4} p_{v,k} \cdot x_{v,k}\right) + \left(\sum_{e \in E} p_{lc} \cdot m_e\right)$$
 (58)

Subject to:

$$\sum_{k=1}^{4} x_{v,k} = 1, \forall v \in V$$
(59)

Timing constraints:

$$\sum_{k=1}^{4} d_{v,k} \cdot x_{v,k} + s(v) \le L, \ \forall v \in V$$
 (60)

$$\sum_{k=1}^{\infty} d_{u,k} \cdot x_{u,k} + d_{lc} \cdot m(e) + s(u) \le s(v), \ \forall e_{u,v} \in E$$
 (61)

$$s(v) \ge 0, \forall v \in V$$
 (62)

Level converter (LC) constraints:

$$\sum_{i=1}^{4} z_{u,i} \cdot x_{u,i} - \sum_{j=1}^{4} z_{v,j} \cdot x_{v,j} + D \cdot m(e) \ge 0, \ \forall e \in E \quad (63)$$

$$x_{v,k} \in \{0,1\}, \forall v \in V$$
 (64)

$$m(e) \in \{0, 1\}, \forall e \in E$$
 (65)

$$s(u) = 0$$

$$d(u) = 1$$

$$s(v) = 2$$









MILP Formulation

Minimize 
$$\left(\sum_{v \in V} \sum_{k=1}^{4} p_{v,k} \cdot x_{v,k}\right) + \left(\sum_{e \in E} p_{lc} \cdot m_e\right)$$
 (58)

Subject to:

$$\sum_{k=1}^{4} x_{v,k} = 1, \forall v \in V$$
(59)

Timing constraints:

$$\sum_{k=1}^{4} d_{v,k} \cdot x_{v,k} + s(v) \le L, \forall v \in V$$
 (60)

$$\sum_{k=1}^{4} d_{u,k} \cdot x_{u,k} + d_{lc} \cdot m(e) + s(u) \le s(v), \ \forall e_{u,v} \in E$$
 (61)

$$s(v) \ge 0, \forall v \in V$$
 (62)

Level converter (LC) constraints:

$$\sum_{i=1}^{4} z_{u,i} \cdot x_{u,i} - \sum_{j=1}^{4} z_{v,j} \cdot x_{v,j} + D \cdot m(e) \ge 0, \ \forall e \in E \quad (63)$$

Integer constraints:

$$x_{v,k} \in \{0,1\}, \forall v \in V$$
 (64)

$$m(e) \in \{0, 1\}, \forall e \in E$$
 (65)

Cycle time (L) = 2

$$s(u) = 0 \qquad s(v) = 1$$

$$d(u) = 1$$
  $d(v) = 1$ 



$$s(u) + d(u) \le 2$$
  $s(v) + d(v) \le 2$ 





MILP Formulation

Minimize 
$$\left(\sum_{v \in V} \sum_{k=1}^{4} p_{v,k} \cdot x_{v,k}\right) + \left(\sum_{e \in E} p_{lc} \cdot m_e\right)$$
 (58)

Subject to:

$$\sum_{k=1}^{4} x_{v,k} = 1, \forall v \in V$$
(59)

Timing constraints:

$$\sum_{k=1}^{4} d_{v,k} \cdot x_{v,k} + s(v) \leq L, \forall v \in V \quad (60)$$

$$\sum_{k=1}^{4} d_{u,k} \cdot x_{u,k} + d_{lc} \cdot m(e) + s(u) \le s(v), \ \forall e_{u,v} \in E$$
 (61)

$$s(v) \ge 0, \forall v \in V$$
 (62)

Level converter (LC) constraints:

$$\sum_{i=1}^{4} z_{u,i} \cdot x_{u,i} - \sum_{j=1}^{4} z_{v,j} \cdot x_{v,j} + D \cdot m(e) \ge 0, \ \forall e \in E \quad (63)$$

$$x_{v,k} \in \{0,1\}, \forall v \in V$$
 (64)

$$m(e) \in \{0, 1\}, \forall e \in E$$
 (65)





#### Convert from ILP to LP



Assume only two states for illustration purpose



# Gradient Search Algorithm for LC Relaxation





# Gradient Search Algorithm for LC Relaxat







### **Voltage Assignment**

#### Four possible voltage assignment:

- High V<sub>dd</sub>, low V<sub>th</sub> node
   Fastest gate, high dynamic power, high leakage power
- High V<sub>dd</sub>, high V<sub>th</sub> node
   High dynamic power, low leakage power
- Low V<sub>dd</sub>, low V<sub>th</sub> node
  Low dynamic power, high leakage power
- Low V<sub>dd</sub>, high V<sub>th</sub> node
  Slowest gate, low dynamic power, low leakage power



# Possible Supply Voltage Assignment

#### Feasible Solution

#### Infeasible Solution



















```
Voltage Mapping Algorithm
input: LP-based voltage scaling with LC inserted
output: ILP-based voltage scaling with reduced LC set

    T = topological ordering of gates;

 assign low-V<sub>dd</sub>+high-V<sub>th</sub> to all PIs;

while (T is not empty)
    v = T.pop;
5. dly(v) = \sum_{k=1}^{4} x_{v,k} \cdot d_{v,k};
6. vdd(v) = x_{v,1} + x_{v,3};
7. v \leftarrow V_{dd}-L+V_{th}-H;
//V_{dd} mapping
       if (\exists u \in FI(v) | m(e_{u,v}) = 1)
      v \leftarrow V_{dd}-H;
      if (vad(v) > 0)
10.
           v \leftarrow V_{dd}-H:
11.
// LC removal
12. if (\exists u \in FI(v)|u = V_{dd}\text{-H \& }m(e_{u,v}) = 1 \text{ or }
        u = V_{dd}-L & m(e_{u,v}) = 1 and v = V_{dd}-L)
        m(e_{u,v}) \leftarrow 0
13.
//V_{th} mapping
     if (v = V_{dd}-H & dly(v) < delay(V_{dd}-H+V_{th}-H))
15. v \leftarrow V_{th}-L;
16. if (v = V_{dd}\text{-L } \& dly(v) < delay(V_{dd}\text{-L}+V_{th}\text{-H}))
17. v \leftarrow V_{th}-L;
```



- V) low Vdd
- 🕠 high Vdd



```
Voltage Mapping Algorithm
input: LP-based voltage scaling with LC inserted
output: ILP-based voltage scaling with reduced LC set

    T = topological ordering of gates;

 assign low-V<sub>dd</sub>+high-V<sub>th</sub> to all PIs;

while (T is not empty)
    v = T.pop;
5. dly(v) = \sum_{k=1}^{4} x_{v,k} \cdot d_{v,k};
6. vdd(v) = x_{v,1} + x_{v,3};
7. v \leftarrow V_{dd}-L+V_{th}-H;
//V_{dd} mapping
        if (\exists u \in FI(v) | m(e_{u,v}) = 1)
10. \int if (vdd(v) > 0)
           v \leftarrow V_{dd}-H;
// LC removal
12. if (\exists u \in FI(v)|u = V_{dd}\text{-H \& }m(e_{u,v}) = 1 \text{ or }
        u = V_{dd}-L & m(e_{u,v}) = 1 and v = V_{dd}-L)
         m(e_{n,n}) \leftarrow 0
13.
//V_{th} mapping
     if (v = V_{dd}-H & dly(v) < delay(V_{dd}-H+V_{th}-H))
14.
15. v \leftarrow V_{th}-L;
16. if (v = V_{dd}\text{-L} \& dly(v) < \text{delay}(V_{dd}\text{-L}+V_{th}\text{-H}))
17. v \leftarrow V_{th}-L;
```



- (VL) low Vdd
- 🕠 high Vdd



```
Voltage Mapping Algorithm
input: LP-based voltage scaling with LC inserted
output: ILP-based voltage scaling with reduced LC set

    T = topological ordering of gates;

 assign low-V<sub>dd</sub>+high-V<sub>th</sub> to all PIs;

while (T is not empty)
    v = T.pop;
5. dly(v) = \sum_{k=1}^{4} x_{v,k} \cdot d_{v,k};
6. vdd(v) = x_{v,1} + x_{v,3};
7. v \leftarrow V_{dd}-L+V_{th}-H;
//V_{dd} mapping
8. if (\exists u \in FI(v) | m(e_{u,v}) = 1)

 v ← V<sub>dd</sub>-H;

10. if (vdd(v) > 0)
     v \leftarrow V_{dd}-H;
11.
// LC removal
12. If (\exists u \in FI(v)|u = V_{dd} - H \& m(e_{u,v}) = 1 or
        u = V_{dd}-L & m(e_{u,v}) = 1 and v = V_{dd}-L)
13.
//V_{th} mapping
     if (v = V_{dd}-H & dly(v) < delay(V_{dd}-H+V_{th}-H))
15. v \leftarrow V_{th}-L;
16. if (v = V_{dd}\text{-L} \& dly(v) < \text{delay}(V_{dd}\text{-L}+V_{th}\text{-H}))
17. v \leftarrow V_{th}-L;
```





```
Voltage Mapping Algorithm
input: LP-based voltage scaling with LC inserted
output: ILP-based voltage scaling with reduced LC set

    T = topological ordering of gates;

 assign low-V<sub>dd</sub>+high-V<sub>th</sub> to all PIs;

while (T is not empty)
    v = T.pop;
5. dly(v) = \sum_{k=1}^{4} x_{v,k} \cdot d_{v,k};
6. vdd(v) = x_{v,1} + x_{v,3};
7. v \leftarrow V_{dd}-L+V_{tb}-H;
//V_{dd} mapping
8. if (\exists u \in FI(v) | m(e_{u,v}) = 1)

 v ← V<sub>dd</sub>-H;

10. if (vdd(v) > 0)
11. v \leftarrow V_{dd}-H;
// LC removal
12. if (\exists u \in FI(v)|u = V_{dd}\text{-H \& }m(e_{u,v}) = 1 \text{ or }
        u = V_{dd}-L & m(e_{u,v}) = 1 and v = V_{dd}-L)
13.
        m(e_{u,v}) \leftarrow 0
//V_{th} mapping
       if (v = V_{dd}-H & dly(v) < delay(V_{dd}-H+V_{th}-H))
15.
         v \leftarrow V_{tb}-L:
16. if (v = V_{dd}\text{-L} \& dly(v) < \text{delay}(V_{dd}\text{-L}+V_h\text{-H}))
17.
       v \leftarrow V_{th}-L;
```

#### Assigned V<sub>dd</sub>High to V

$$Slk = 2.2$$



high  $V_{dd}$  low  $V_{th}$  high  $V_{dd}$  high  $V_{th}$ 



```
Voltage Mapping Algorithm
input: LP-based voltage scaling with LC inserted
output: ILP-based voltage scaling with reduced LC set

    T = topological ordering of gates;

 assign low-V<sub>dd</sub>+high-V<sub>th</sub> to all PIs;

while (T is not empty)
    v = T.pop;
5. dly(v) = \sum_{k=1}^{4} x_{v,k} \cdot d_{v,k};
6. vdd(v) = x_{v,1} + x_{v,3};
7. v \leftarrow V_{dd}-L+V_{tb}-H;
//V_{dd} mapping
8. if (\exists u \in FI(v) | m(e_{u,v}) = 1)

 v ← V<sub>dd</sub>-H;

10. if (vdd(v) > 0)
11. v \leftarrow V_{dd}-H;
// LC removal
12. if (\exists u \in FI(v)|u = V_{dd}\text{-H \& }m(e_{u,v}) = 1 \text{ or }
        u = V_{dd}-L & m(e_{u,v}) = 1 and v = V_{dd}-L)
13.
        m(e_{u,v}) \leftarrow 0
//V_{th} mapping
       if (v = V_{dd}-H & dly(v) < delay(V_{dd}-H+V_{th}-H))
15.
         v \leftarrow V_{tb}-L:
16. if (v = V_{dd}\text{-L} \& dly(v) < \text{delay}(V_{dd}\text{-L}+V_h\text{-H}))
17.
       v \leftarrow V_{th}-L;
```

#### Assigned V<sub>dd</sub>High to V

$$Slk = 1.5$$



high  $V_{dd}$  low  $V_{th}$  high  $V_{dd}$  high  $V_{th}$ 



#### Voltage Mapping Algorithm

input: LP-based voltage scaling with LC inserted output: ILP-based voltage scaling with reduced LC set

- T = topological ordering of gates;
- assign low-V<sub>dd</sub>+high-V<sub>th</sub> to all PIs;
- while (T is not empty)
- 4. v = T.pop;
- 5.  $dly(v) = \sum_{k=1}^{4} x_{v,k} \cdot d_{v,k}$ ;
- 6.  $vdd(v) = x_{v,1} + x_{v,3}$ ;
- 7.  $v \leftarrow V_{dd}$ -L+ $V_{th}$ -H;
- $//V_{dd}$  mapping
- 8. if  $(\exists u \in FI(v) | m(e_{u,v}) = 1)$
- v ← V<sub>dd</sub>-H;
- 10. if (vdd(v) > 0)
- 11.  $v \leftarrow V_{dd}$ -H;
- // LC removal
- 12. if  $(\exists u \in FI(v)|u = V_{dd}\text{-H \& }m(e_{u,v}) = 1 \text{ or } u = V_{dd}\text{-L \& }m(e_{u,v}) = 1 \text{ and } v = V_{dd}\text{-L})$
- 13.  $m(e_{u,v}) \leftarrow 0$
- $//V_{th}$  mapping
- 14. if  $(v = V_{dd}$ -H &  $dly(v) < delay(V_{dd}$ -H+ $V_{th}$ -H))
- 15.  $v \leftarrow V_{th}$ -L;
- 16. if  $(v = V_{dd}\text{-L } \& dly(v) < delay(V_{dd}\text{-L}+V_{th}\text{-H}))$
- 17.  $v \leftarrow V_{th}$ -L;



#### Assume only two states







# Gradient Search Algorithm for LC Relaxation









#### Post Refinement

input: retimed and voltage scaled solution output: refined voltage scaling solution

#### // clustering

- 1. perform static timing analysis;
- mark all nodes with positive timing slack;
- form clusters among marked nodes;
- 4. sort clusters based on its size;

#### // main loop

- 5. for (each cluster C)
- while (there is power reduction)
- 7. **for** (each node  $v \in C$ )
- 8. power\_gain(v, slk(v), C);
- 9.  $z = \max \text{ power gain node};$
- 10. commit voltage change for z;
- 11. update slack for downstream nodes of z;



#### **Outline**



- Introduction and Motivation
- Related Work
- Methodology
- Experimental Results
- Conclusions





## **Impact of Retiming on Power**

|       | Retiming + Scaling [Chabini04] |       |                        | min FF. retiming |                      |       |                        |       |
|-------|--------------------------------|-------|------------------------|------------------|----------------------|-------|------------------------|-------|
|       | V <sub>dd</sub> (uW)           |       | $V_{dd} + V_{th} (uW)$ |                  | V <sub>dd</sub> (uW) |       | $V_{dd} + V_{th} (uW)$ |       |
| ckt   | GL                             | GLF   | GL                     | GLF              | GL                   | GLF   | GL                     | GLF   |
| s641  | 295.9                          | 372.1 | 170                    | 246.1            | 316.6                | 361.8 | 187.4                  | 232.6 |
| s713  | 311.1                          | 399.2 | 181.9                  | 269.9            | 330.5                | 375.8 | 199.4                  | 244.6 |
| s820  | 381                            | 404.8 | 299                    | 322.8            | 400.8                | 415   | 296                    | 310.3 |
| s832  | 384                            | 407.8 | 307.4                  | 331.2            | 403.9                | 418.2 | 299.9                  | 314.2 |
| s838  | 383.5                          | 543   | 247.6                  | 407              | 436.9                | 586.9 | 283.6                  | 433.5 |
| s1196 | 567.9                          | 758.3 | 389.3                  | 579.7            | 538.7                | 591   | 381.8                  | 434.1 |
| s1238 | 567                            | 764.6 | 404.8                  | 602.3            | 546.7                | 599.1 | 395.5                  | 447.9 |
| s1488 | 761.8                          | 821.3 | 568.1                  | 627.6            | 765                  | 781.7 | 535.7                  | 552.4 |
| s1494 | 761.4                          | 835.2 | 569.8                  | 643.5            | 765                  | 781.7 | 536.2                  | 552.8 |
| Ratio | -                              | 1     | -                      | 0.76             | -                    | 0.93  | -                      | 0.66  |

GL = Gate Power + LC Power GLF = Gate Power + LC Power + FF Power



# Power Comparison on Different Voltage Scaling Techniques (in uW)

| ckt   | INIT   | CVS <sub>[Usami95]</sub> | LP     | ILP   |  |
|-------|--------|--------------------------|--------|-------|--|
| s641  | 434.3  | 374.5                    | 232.6  | 230.6 |  |
| s713  | 458    | 392.3                    | 244.6  | 243.3 |  |
| s820  | 425.7  | 412.4                    | 310.3  | 309.7 |  |
| s832  | 428.9  | 415.6                    | 314.2  | 312.5 |  |
| s838  | 627.3  | 579.6                    | 433.5  | 428.6 |  |
| s1196 | 646.8  | 616.2                    | 434.1  | 434.1 |  |
| s1238 | 648.4  | 619.1                    | 447.9  | 446.6 |  |
| s1488 | 796.1  | 773.8                    | 552.4  | 551.4 |  |
| s1494 | 795.5  | 773.2                    | 552.8  | 550.5 |  |
| ratio | 1      | 0.94                     | 0.66   | 0.66  |  |
| time  | 28 sec | 29 sec                   | 44 sec | 1 day |  |

INIT = all nodes  $V_{dd}$ -H +  $V_{th}$ -L LP = Linear Programming

CVS= clustered Voltage Scaling ILP = Integer Linear Programming



#### **Outline**



- Introduction and Motivation
- Related Work
- Methodology
- Experimental Results
- Conclusions





#### **Conclusions**

 Power minimization is an important VLSI design issue: both static and dynamic power

We propose a mathematical model to solve power optimization issue while maintain the target clock period

The experiment results show up to 30% power reduction









## Delay and Power for Voltage Scaling

| config                 | delay | dynamic | leakage |
|------------------------|-------|---------|---------|
| High-Vdd/Low-Vth gate  | 1.00  | 1.69    | 0.44    |
| Low-Vdd/Low-Vth gate   | 2.53  | 0.36    | 0.44    |
| High-Vdd/High-Vth gate | 1.24  | 1.69    | 0.058   |
| Low-Vdd/High-Vth gate  | 4.26  | 0.36    | 0.058   |
| level conversion FF    | 11.39 | 5.07    | 1.34    |
| level converter        | 1.77  | 1.03    | 0.25    |





#### **Retiming Algorithm**

FF. edge has weight = clock period \* number of FF.

If Bellman-Ford algorithm has a feasible solution, the target clock period is feasible



 Binary search is used to identify smallest feasible clock period (cycle time)

**Flipflop** 





### **Retiming LP Formulation**

Minimize 
$$\{FI(v) - FO(v)\} \cdot r(v)$$
 (55)

Subject to:

$$r(u) - r(v) \le w(e_{u,v}), \ \forall e_{u,v} \in E \tag{56}$$

$$r(u) - r(v) \le W(u, v), \ \forall D(u, v) > L, \ \forall u, v \in V$$
 (57)







#### **Retiming Formulation**

#### Objective:

Minimize the number of flip-flops (FF.)

#### Minimize $\{FI(v) - FO(v)\} \cdot r(v)$ (53)

Subject to:

$$r(u) - r(v) \le w(e_{u,v}), \ \forall e_{u,v} \in E \quad (54)$$

$$r(u) - r(v) \le W(u, v) - 1, \ \forall D(u, v) > L, \ \forall u, v \in V \quad (55)$$

#### Constraints:

- Num. FF. has to be satisfied r(u) ≤ w(e<sub>u,v</sub>) + r(v)
- Num. FF. on critical paths has to be greater than zero

$$W_{i}(\mathcal{E})$$

0

0

0

0

Cycle Time (L) =2

$$D(1,2) = 2$$

$$D(1,3) = 3$$

$$D(2,3) = 2$$

$$W(1,2) = 0$$

$$W(1,3) = 1$$

$$W(2,3) = 1$$

r(v) is the number of FF. moved from fanout of node v to fanin of node v  $w(e_{u,v})$  is the FF. count on edge u,v,

D(u,v) is the maximum delay on path u,v

W(u,v) is minimum number of FF. on path u,v



## LC







### **LCFF**







# Power Comparison on Different Voltage Scaling Techniques (in uW)

| ckt   | INIT   | CVS <sub>[Usami95]</sub> | MVVS<br>[Srivastava04] | LP     | ILP   |
|-------|--------|--------------------------|------------------------|--------|-------|
| s641  | 434.3  | 374.5                    | 253.9                  | 232.6  | 230.6 |
| s713  | 458    | 392.3                    | 268.7                  | 244.6  | 243.3 |
| s820  | 425.7  | 412.4                    | 327.7                  | 310.3  | 309.7 |
| s832  | 428.9  | 415.6                    | 331.6                  | 314.2  | 312.5 |
| s838  | 627.3  | 579.6                    | 440.1                  | 433.5  | 428.6 |
| s1196 | 646.8  | 616.2                    | 439.0                  | 434.1  | 434.1 |
| s1238 | 648.4  | 619.1                    | 454.1                  | 447.9  | 446.6 |
| s1488 | 796.1  | 773.8                    | 577.1                  | 552.4  | 551.4 |
| s1494 | 795.5  | 773.2                    | 575.0                  | 552.8  | 550.5 |
| ratio | 1      | 0.94                     | 0.69                   | 0.66   | 0.66  |
| time  | 28 sec | 29 sec                   | 35 sec                 | 44 sec | 1 day |